Replication Package for 
"LinkedOut? A Field Experiment on Discrimination in Job Network Formation" 
by Yulia Evsyukova, Felix Rusche, Wladislaw Mill
=============================================

Overview
--------
This repository contains the replication files for the paper "LinkedOut? A Field 
Experiment on Discrimination in Job Network Formation".
The analysis is split across multiple files and folders, which are described below. 
Some processes are computationally intensive and have been separated to facilitate 
efficient replication. Key datasets and outputs are also provided.

File Structure
--------------
- LinkedOutReplication.Rnw:
    The main source file for replicating the analysis in the paper. This script contains 
    most of the analysis except for a few time-intensive sections.
  
- causal_forest_replication.R:
    This file is dedicated to the causal forest analysis. This step is computationally 
    expensive, so it is separated for better workflow management. All results from this 
    analysis are saved in the appropriate folder (/figureCausalForest).

- references.bib:
    This file contains all references used in the paper.

- /Pictures:
    Contains screenshots and the picture documentation.
    It also contains sample pictures of Black and White profiles.

- /boot:
    Stores the results of the bootstrapped differences computed as part of the expert 
    survey. Because these computations are resource-intensive, the corresponding section 
    (OutDataCombined.Boot.Rel) is commented out in LinkedOutReplication.Rnw, and the saved 
    results are provided in this folder. To recompute them, uncomment the section and 
    change eval=F to eval=T.

- /GPTCodedSurveyResponses:
    Contains datasets of GPT-coded LinkedIn survey responses 
    (e.g., gpt_DownsidesUnknown.rds).

- /figureCausalForest:
    Contains the output images and visualizations from the causal forest replication.

- /renv:
    Contains the R libraries required for the replication.

- renv.lock:
    The renv lockfile, which records the R environment (package versions) used for the 
    replication.

- /figure:
    Contains all figures produced by LinkedOutReplication.Rnw.

- LinkedOutReplication.tex:
    This file is produced by LinkedOutReplication.Rnw, and is needed to compile the paper
    and to create the PDF.

- /ProfilePictures:
    Contains all the profile pictures created for the experiment. Picture names correspond 
    to the four StyleGAN2 images merged to obtain the output image (original file names 
    separated by "_"). Accordingly, images with more common "ancestors" look more similar.

- /DataForMap:
    Stores the data needed to replicate the figure "MapFelix-1". Because this figure is 
    resource-intensive to produce, the corresponding sections (MapFelixPrep and MapFelix) 
    are commented out in LinkedOutReplication.Rnw. To compile the figure, uncomment these 
    sections and change eval=F to eval=T.

Datasets
--------
- LinkedOutPreTests.csv:
    The original validation experiment dataset, conducted via MTurk and exported 
    from Qualtrics.

- ExpertSurvey.rds:
    An anonymized version of the expert survey dataset, with potentially personally 
    identifiable information, such as metadata and email addresses, removed.

- Prolific_LinkedIn_Survey.csv:
    The original LinkedIn survey data, conducted on Prolific and exported from 
    Qualtrics.

- experiment_edge_replication.rds:
    The main dataset containing information on all the targets and their decisions.

- experiment_profile_replication.rds:
    Contains information on the profiles and the profile-level aggregates.

- MessagesRas.rds:
    Contains the research assistant coding of the messages.

- daten.Space.Red.rds:
    A small dataset of state-level acceptance-gap differences merged with
    additional state information.

- experiment_MessagesRasWithAddAll_replication.rds:
    Contains message and target information. 

- experiment_profiledaily.rds:
    Daily profile-level acceptance data from the LinkedIn experiment.
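
The datasets listed above can be read with base R alone. The following is a minimal 
sketch and not part of the replication code: `read_replication_data` is a hypothetical 
helper, and it assumes the data files sit in the directory passed as `data_dir`. The 
column layout of the files is not documented here, so inspect each object (e.g., with 
`str()`) after loading.

```r
# Hypothetical helper (not part of the replication package): read one of the
# datasets listed above, choosing the reader from the file extension.
# Assumption: the data files are located in `data_dir`.
read_replication_data <- function(file_name, data_dir = ".") {
  path <- file.path(data_dir, file_name)
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  ext <- tolower(tools::file_ext(path))
  switch(
    ext,
    rds = readRDS(path),                             # R serialized objects
    csv = read.csv(path, stringsAsFactors = FALSE),  # Qualtrics exports
    stop("Unsupported file type: ", ext)             # anything else
  )
}

# Example usage (uncomment once the files are in place):
# edges    <- read_replication_data("experiment_edge_replication.rds")
# pretests <- read_replication_data("LinkedOutPreTests.csv")
# str(edges)
```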

Instructions for Replication
----------------------------
1. Ensure that R and LaTeX are properly installed on your system.
2. Restore the R environment using the `renv` package. This can be done by running 
   `renv::restore()` to install the required libraries.
3. Run the `LinkedOutReplication.Rnw` file to replicate the main results of the paper. 
   Some parts of the analysis, such as the bootstrapped differences and the causal forest 
   analysis, are excluded from this file because they are computationally intensive.
4. For the causal forest analysis, run the `causal_forest_replication.R` file. The outputs 
   will be saved in the `/figureCausalForest` folder.
5. The bootstrapped differences are provided in the `/boot` folder, so there is no need to 
   recompute them unless desired.
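
Steps 2 to 4 can also be run from an R session. The sketch below is one possible way to 
do so under stated assumptions, not the authors' own workflow: `required_files`, 
`missing_replication_files`, and `run_replication` are hypothetical names, and the 
sketch assumes that knitr is the engine used to process the `.Rnw` file and that the 
working directory can be set to the repository root.

```r
# Files named in this README that the steps below rely on.
required_files <- c(
  "renv.lock",
  "LinkedOutReplication.Rnw",
  "causal_forest_replication.R"
)

# Hypothetical helper: return the required files that are missing from `project_dir`.
missing_replication_files <- function(project_dir = ".") {
  present <- file.exists(file.path(project_dir, required_files))
  required_files[!present]
}

# Hypothetical wrapper around steps 2-4.
# Assumption: knitr is the engine for the .Rnw file.
run_replication <- function(project_dir = ".", run_causal_forest = FALSE) {
  missing <- missing_replication_files(project_dir)
  if (length(missing) > 0) {
    stop("Missing files: ", paste(missing, collapse = ", "))
  }
  old_wd <- setwd(project_dir)
  on.exit(setwd(old_wd), add = TRUE)

  renv::restore(prompt = FALSE)            # step 2: install the locked packages
  knitr::knit("LinkedOutReplication.Rnw")  # step 3: writes LinkedOutReplication.tex
  if (run_causal_forest) {
    source("causal_forest_replication.R")  # step 4: computationally expensive
  }
  invisible(TRUE)
}

# Example usage (uncomment to run from the repository root):
# run_replication(".", run_causal_forest = FALSE)
```

The causal forest step is off by default in this sketch because it is the slow part; 
the resulting `LinkedOutReplication.tex` still has to be compiled with LaTeX to obtain 
the PDF.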

Notes
-----
- Some datasets (e.g., experiment_edge_replication.rds, 
  experiment_MessagesRasWithAddAll_replication.rds) have been anonymized as the original 
  data contained sensitive information. Specifically, we removed all columns that may 
  contain personally identifiable information, such as URLs, picture names, full names, 
  specific employers, etc. As part of our IRB approval, we ensured that the dataset is 
  fully anonymized, and we are therefore unable to share the original, non-anonymized data.

- The causal forest analysis and bootstrapped differences are computationally intensive. 
  Running these processes may take a significant amount of time depending on the system’s 
  specifications.
